1 Noise reduction in a statistical approach to text categorization
Yiming Yang 

July 1995 Proceedings of the 18th annual international ACM SIGIR conference on 
Research and development in information retrieval 

Full text available: pdf(895.10 KB) Additional Information: full citation, references, citings, index terms



2 Bitext maps and alignment via pattern recognition
I. Dan Melamed 

March 1999 Computational Linguistics, Volume 25 Issue 1

Full text available: pdf(1.63 MB) Additional Information: full citation, abstract, references
Publisher Site 

Texts that are available in two languages (bitexts) are becoming more and more plentiful, 
both in private data warehouses and on publicly accessible sites on the World Wide Web. As 
with other kinds of data, the value of bitexts largely depends on the efficacy of the available 
data mining tools. The first step in extracting useful information from bitexts is to find 
corresponding words and/or text segment boundaries in their two halves (bitext maps). This
article advances the state o ... 

3 Word sense disambiguation using a second language monolingual corpus
Ido Dagan, Alon Itai 

December 1994 Computational Linguistics, Volume 20 Issue 4 

Full text available: pdf(2.57 MB) Additional Information: full citation, abstract, references
Publisher Site 

This paper presents a new approach for resolving lexical ambiguities in one language using 
statistical data from a monolingual corpus of another language. This approach exploits the 
differences between mappings of words to senses in different languages. The paper 
concentrates on the problem of target word selection in machine translation, for which the 
approach is directly applicable. The presented algorithm identifies syntactic relations 
between words, using a source language parser, and maps t ... 
Sergey Brin, James Davis, Hector Garcia-Molina

May 1995 ACM SIGMOD Record, Proceedings of the 1995 ACM SIGMOD international
conference on Management of data, Volume 24 issue 2

Full text available: pdf(1.51 MB) Additional Information: full citation, abstract, references, citings, index terms

In a digital library system, documents are available in digital form and therefore are more 
easily copied and their copyrights are more easily violated. This is a very serious problem, 
as it discourages owners of valuable information from sharing it with authorized users. There 
are two main philosophies for addressing this problem: prevention and detection. The 
former actually makes unauthorized use of documents difficult or impossible while the latter 
makes it easier to discover such activity. I ... 

5 Special issue on machine learning approaches to shallow parsing: Shallow parsing
using noisy and non-stationary training material
Miles Osborne 

March 2002 The Journal of Machine Learning Research, volume 2 

Full text available: pdf(181.57 KB) Additional Information: full citation, abstract, references, index terms

Shallow parsers are usually assumed to be trained on noise-free material, drawn from the 
same distribution as the testing material. However, when either the training set is noisy or 
else drawn from a different distribution, performance may be degraded. Using the parsed
Wall Street Journal, we investigate the performance of four shallow parsers (maximum 
entropy, memory-based learning, N-grams and ensemble learning) trained using various 
types of artificially noisy material. ... 

6 Similarity in harder cases: sentencing for fraud
Ruth Murbach, Eva Nonn 

August 1993 Proceedings of the fourth international conference on Artificial 
intelligence and law 

Full text available: pdf(943.49 KB) Additional Information: full citation, abstract, references, citings, index terms

We focus on one of the central concepts of case-based reasoning: similarity. In the field of 
sentencing, where the really decided cases are often on the harder side, similarity is 
multidimensional and depends less on formal rules than on various legitimate principles, 
objectives and factors which relate to the offender, the victim, the act and its social context. 
The paper presents our data base of empirically analysed cases of fraud and discusses two 
of the different phases completed to re ... 

7 Adaptive multilingual sentence boundary disambiguation
David D. Palmer, Marti A. Hearst 

June 1997 Computational Linguistics, volume 23 issue 2 

Full text available: pdf(1.77 MB) Additional Information: full citation, abstract, references
Publisher Site 

The sentence is a standard textual unit in natural language processing applications. In many
languages the punctuation mark that indicates the end-of-sentence boundary is ambiguous;
thus the tokenizers of most NLP systems must be equipped with special sentence boundary 
recognition rules for every new text collection. As an alternative, this article presents an 
efficient, trainable system for sentence boundary disambiguation. The system, called Satz, 
makes simple estimates of the parts of speech of ... 

8 Fast detection of communication patterns in distributed executions
Thomas Kunz, Michiel F. H. Seuren 

November 1997 Proceedings of the 1997 conference of the Centre for Advanced Studies 
on Collaborative research 

Full text available: pdf(4.21 MB) Additional Information: full citation, abstract, references, index terms

Understanding distributed applications is a tedious and difficult task. Visualizations based on
process-time diagrams are often used to obtain a better understanding of the execution of 
the application. The visualization tool we use is Poet, an event tracer developed at the 
University of Waterloo. However, these diagrams are often very complex and do not provide 
the user with the desired overview of the application. In our experience, such tools display 
repeated occurrences of non-trivial commun ... 

9 Voice response systems
D. L. Lee, F. H. Lochovsky

December 1983 ACM Computing Surveys (CSUR), Volume 15 issue 4 

Full text available: pdf(2.22 MB) Additional Information: full citation, references, index terms



10 Special issue on using large corpora: I: Generalized probabilistic LR parsing of natural
language (Corpora) with unification-based grammars

Ted Briscoe, John Carroll 

March 1993 Computational Linguistics, Volume 19 Issue 1

Full text available: pdf(2.02 MB) Additional Information: full citation, abstract, references
Publisher Site 

We describe work toward the construction of a very wide-coverage probabilistic parsing 
system for natural language (NL), based on LR parsing techniques. The system is intended 
to rank the large number of syntactic analyses produced by NL grammars according to the 
frequency of occurrence of the individual rules deployed in each analysis. We discuss a fully 
automatic procedure for constructing an LR parse table from a unification-based grammar 
formalism, and consider the suitability of alternative ... 

11 Discovering models of software processes from event-based data
Jonathan E. Cook, Alexander L. Wolf 

July 1998 ACM Transactions on Software Engineering and Methodology (TOSEM),
Volume 7 Issue 3

Full text available: pdf(369.76 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Many software process methods and tools presuppose the existence of a formal model of a 
process. Unfortunately, developing a formal model for an on-going, complex process can be 
difficult, costly, and error prone. This presents a practical barrier to the adoption of process 
technologies, which would be lowered by automated assistance in creating formal models. 
To this end, we have developed a data analysis technique that we term process discovery. 
Under this technique, data ... 

Keywords: Balboa, process discovery, software process, tools 

12 Techniques for automatically correcting words in text
Karen Kukich 

December 1992 ACM Computing Surveys (CSUR), Volume 24 issue 4 

Full text available: pdf(6.23 MB) Additional Information: full citation, abstract, references, citings, index terms, review

Research aimed at correcting words in text has focused on three progressively more difficult 
problems: (1) nonword error detection; (2) isolated-word error correction; and (3) context-
dependent word correction. In response to the first problem, efficient pattern-matching and
n-gram analysis techniques have been developed for detecting strings that do not appear in 
a given word list. In response to the second problem, a variety of general and application-
specific spelling cor ...

Keywords: n-gram analysis, Optical Character Recognition (OCR), context-dependent 
spelling correction, grammar checking, natural-language-processing models, neural net 
classifiers, spell checking, spelling error detection, spelling error patterns, statistical- 
language models, word recognition and correction 

13 The Hearsay-II Speech-Understanding System: Integrating Knowledge to Resolve
Uncertainty 

Lee D. Erman, Frederick Hayes-Roth, Victor R. Lesser, D. Raj Reddy 
June 1980 ACM Computing Surveys (CSUR), volume 12 issue 2 

Full text available: pdf(3.83 MB) Additional Information: full citation, references, citings, index terms



14 Translator writing systems
Jerome Feldman, David Gries 

February 1968 Communications of the ACM, Volume 11 Issue 2

Full text available: pdf(4.47 MB) Additional Information: full citation, abstract, references, citings

A critical review of recent efforts to automate the writing of translators of programming 
languages is presented. The formal study of syntax and its application to translator writing 
are discussed in Section II. Various approaches to automating the postsyntactic (semantic) 
aspects of translator writing are discussed in Section III, and several related topics in 
Section IV. 

Keywords: compiler, compiler-compiler, generator, macroprocessor, meta-assembler,
metacompiler, parser, semantics, syntactic analysis, syntax, syntax-directed, translator, 
translator writing system 

15 Session: A fast partial parse of natural language sentences using a connectionist
method

Caroline Lyon, Bob Dickerson 

March 1995 Proceedings of the seventh conference on European chapter of the 
Association for Computational Linguistics 

Full text available: pdf(722.75 KB) Additional Information: full citation, abstract, references

Publisher Site 

The pattern matching capabilities of neural networks can be used to locate syntactic 
constituents of natural language. This paper describes a fully automated hybrid system, 
using neural nets operating within a grammatic framework. It addresses the representation 
of language for connectionist processing, and describes methods of constraining the problem 
size. The function of the network is briefly explained, and results are given. 

16 Novelty and topic change: Domain-independent text segmentation using anisotropic
diffusion and dynamic programming

Xiang Ji, Hongyuan Zha 

July 2003 Proceedings of the 26th annual international ACM SIGIR conference on 
Research and development in information retrieval

Full text available: pdf(171.61 KB) Additional Information: full citation, abstract, references, index terms

This paper presents a novel domain-independent text segmentation method, which identifies 
the boundaries of topic changes in long text documents and/or text streams. The method 
consists of three components: As a preprocessing step, we eliminate the document-
dependent stop words as well as the generic stop words before the sentence similarity is
computed. This step assists in the discrimination of the sentence semantic information. Then 
the cohesion information of sentences in a document o ... 

Keywords: anisotropic diffusion, document-dependent stop words, dynamic programming, 
text segmentation 



17 Potpourri: Translation analysis and translation automation 

Pierre Isabelle, Marc Dymetman, George Foster, Jean-Marc Jutras, Elliott Macklovitch, Francois 
Perrault, Xiaobo Ren, Michel Simard 

October 1993 Proceedings of the 1993 conference of the Centre for Advanced Studies 
on Collaborative research: distributed computing - Volume 2 

Full text available: pdf(1.12 MB) Additional Information: full citation, abstract, references

We argue that the concept of translation analysis provides a suitable foundation for a new 
generation of translation support tools. We show that pre-existing translations can be 
analyzed into a structured translation memory and describe our TransSearch bilingual 
concordancing system, which allows translators to harness such a memory. We claim that 
translation analyzers can help detect translation errors in draft translations and we present 
the results of an experiment on the ... 

18 Computational learning theory: survey and selected bibliography 
Dana Angluin 

July 1992 Proceedings of the twenty-fourth annual ACM symposium on Theory of 
computing 

Full text available: pdf(2.11 MB) Additional Information: full citation, references, citings, index terms, review



19 Locally adaptive dimensionality reduction for indexing large time series databases
Kaushik Chakrabarti, Eamonn Keogh, Sharad Mehrotra, Michael Pazzani 
June 2002 ACM Transactions on Database Systems (TODS), Volume 27 issue 2 

Full text available: pdf(1.48 MB) Additional Information: full citation, abstract, references, index terms

Similarity search in large time series databases has attracted much research interest 
recently. It is a difficult problem because of the typically high dimensionality of the data. 
The most promising solutions involve performing dimensionality reduction on the data, then 
indexing the reduced data with a multidimensional index structure. Many dimensionality 
reduction techniques have been proposed, including Singular Value Decomposition (SVD), 
the Discrete Fourier transform (DFT), and the Discrete ... 

Keywords: Dimensionality reduction, indexing, time-series similarity retrieval

20 Learning from a consistently ignorant teacher

Michael Frazier, Sally Goldman, Nina Mishra, Leonard Pitt 

July 1994 Proceedings of the seventh annual conference on Computational learning 
theory 

Full text available: pdf(1.39 MB) Additional Information: full citation, abstract, references, citings, index terms

One view of computational learning theory is that of a learner acquiring the knowledge of a 
teacher. We introduce a formal model of learning capturing the idea that teachers may have 
gaps in their knowledge. The goal of the learner is still to acquire the knowledge of the 
teacher, but now the learner must also identify the gaps. This is the notion of learning from 
a consistently ignorant teacher. We consider the impact of knowledge gaps on learning, for 